Skip to content

Recover PG-vendored C types collapsed to int during header parsing#15

Merged
Nyuke235 merged 1 commit into
MobilityDB:masterfrom
estebanzimanyi:fix/recover-collapsed-c-types
May 26, 2026
Merged

Recover PG-vendored C types collapsed to int during header parsing#15
Nyuke235 merged 1 commit into
MobilityDB:masterfrom
estebanzimanyi:fix/recover-collapsed-c-types

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

Problem

Some MEOS header sets reach libclang with bool / int64 / Timestamp /
TimestampTz / H3Index already collapsed to int (or int *) at the
preprocessor level — the real type name is gone before parsing, so it cannot be
recovered from the AST. The extracted IDL then carries int where the source
says one of those types, and downstream binding generators mis-map them:

  • boolint (should be a boolean)
  • int64 / H3Indexint (should be 64-bit / long)
  • TimestampTz * out-param → int * — generators size the result buffer at
    4 bytes for an 8-byte native write (a buffer under-allocation; observed as
    IndexOutOfBounds / native-heap corruption in a JMEOS consumer).

Fix

A post-parse pass (parser/typerecover.py) that recovers these from the raw
header declaration text (which still spells the real type) and rewrites the
IDL entry. Wired into run.py right after parse_all_headers.

It is idempotent and a no-op on correctly-parsed headers: it only rewrites a
type that is currently "int" / "int *" and whose header declaration spells
a recoverable type. Genuinely-int functions (e.g. intspan_width) are left
untouched.

Validation

  • Unit test over scalar bool/int64/TimestampTz*/H3Index returns and params,
    plus a genuine-int control: all recovered correctly, control unchanged,
    re-run is a 0/0 no-op (idempotent).
  • Parses the real MEOS headers cleanly (3548 extern decls).

This makes the IDL — and any binding regenerated from it — reproducible without
external post-processing scripts.

The host-symbol-collision build prefix-renames PG types and the parse
lacks pg_config.h, so opaque PG-vendored types reach libclang already
macro-collapsed and are spelled int / int * / int ** in the parsed IDL.
This post-parse pass recovers each from the header declaration text,
preserving const / pointer levels, and only when the function's parsed
type actually collapsed to int.

Recovered base types: bool, int64, Timestamp(Tz), H3Index, text,
GSERIALIZED, Interval, DateADT, Datum, size_t, GBOX, BOX3D, AFFINE.
Audited against a correct-typed reference IDL: zero int*-where-a-named-
pointer-belongs mismatches remain, so every binding that codegens from
the catalog gets the real types (e.g. tcbuffer_convex_hull -> GSERIALIZED *,
temporal_tprecision(..., const Interval *, ...)).
@Nyuke235
Copy link
Copy Markdown
Collaborator

Approving and merging

Empirical verification

I cross-checked this end-to-end via MobilityDB/MEOS.js#2 (which carries an IDL regenerated against this branch):

  • Cloned the refresh branch, ran npx tsc --noEmit against the regenerated bindings -> 0 errors
  • Spot-checked the IDL on functions that were previously collapsed:
    • tbool_value_at_timestamptz: ret=bool, params=(..., bool *value)
    • span_eq: ret=bool
    • intset_value_n: ret=bool
    • temporal_timestamps: ret=TimestampTz *
  • Confirmed reproducibility: regenerating bindings.c + functions.generated.ts from the recovered IDL produced bit-for-bit identical files to what was committed.

For context, without this fix, the same regen produces 234 TypeScript errors (123 number <-> boolean, 81 boolean <-> number, 30 arity mismatches from bool * out-params). So the empirical gain is exactly what the description promises.

Relationship to #1 (stdbool-stub)

This PR functionally supersedes #1: it handles the same scalar collapses (bool, int64, Timestamp*) plus the more disruptive text / GSERIALIZED cases that #1 doesn't address at all. Approach is different (post-parse text recovery vs. preprocessor stubbing) but the coverage here is a strict superset and the test discipline is in another league.

Suggesting we close #1 as superseded once this lands, running both would mean two layers fighting over the same slots, and the preprocessor stub approach is the more brittle of the two (drift any time MEOS adds a typedef).

Approve and merge from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants